TD(λ) and Q-learning based Ludo players

Authors

  • Majed Alhajry
  • Faisal Alvi
  • Moataz Ahmed
Abstract


Similar articles

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...

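As a rough illustration of the one-step Q(σ) backup described above, here is a minimal tabular sketch in Python; the function name, the policy table pi, and all parameters are assumptions for illustration, not taken from the paper.

import numpy as np

def q_sigma_update(Q, pi, s, a, r, s_next, a_next, sigma, alpha=0.1, gamma=0.99):
    # Q: |S| x |A| action-value table; pi: |S| x |A| policy probabilities.
    # sigma = 1 recovers the Sarsa target, sigma = 0 the Expected-Sarsa target
    # (which becomes Q-learning when pi is greedy with respect to Q).
    sarsa_term = Q[s_next, a_next]                 # sampled next action value
    expected_term = np.dot(pi[s_next], Q[s_next])  # expectation under the policy
    target = r + gamma * (sigma * sarsa_term + (1.0 - sigma) * expected_term)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q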

A Counterexample to Temporal Differences Learning

Sutton’s TD(λ) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on λ. For λ = 1 the representation is optimal with respect to a least squares error criterion, but as λ decreases towards 0 the representation becomes progressively worse and, in some cases, very poor....

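For reference, the TD(λ) value update this abstract refers to can be sketched in a tabular setting with accumulating eligibility traces; the env and policy interfaces below are hypothetical and only meant to show the shape of the update.

import numpy as np

def td_lambda_episode(env, policy, V, alpha=0.1, gamma=1.0, lam=0.9):
    # Tabular TD(lambda) with accumulating traces: each visited state keeps an
    # eligibility e[s] that decays by gamma*lambda per step, and every TD error
    # is credited to all states in proportion to their current traces.
    e = np.zeros_like(V)
    s = env.reset()
    done = False
    while not done:
        s_next, r, done = env.step(policy(s))
        delta = r + gamma * V[s_next] * (not done) - V[s]  # one-step TD error
        e[s] += 1.0
        V += alpha * delta * e
        e *= gamma * lam
        s = s_next
    return V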

Two Novel On-policy Reinforcement Learning Algorithms based on TD(lambda)-methods

This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning ...

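One way to read the QV(λ) idea in code: the state values V are learned with TD(λ), and the Q-values are then moved toward a target that bootstraps from the learned V rather than from Q itself. The following tabular sketch reflects that reading of the abstract; the names, parameters, and trace handling are assumptions, not the paper's exact algorithm.

import numpy as np

def qv_lambda_step(V, Q, e, s, a, r, s_next, done,
                   alpha_v=0.1, alpha_q=0.1, gamma=0.99, lam=0.8):
    # TD(lambda) update of the state-value function V.
    delta_v = r + gamma * V[s_next] * (not done) - V[s]
    e[s] += 1.0                 # accumulating eligibility trace
    V += alpha_v * delta_v * e
    e *= gamma * lam
    # Q-learning-style update that bootstraps from V(s') instead of max_a Q(s', a).
    q_target = r + gamma * V[s_next] * (not done)
    Q[s, a] += alpha_q * (q_target - Q[s, a])
    return V, Q, e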

Learning Team Strategies with Multiple Policy-sharing Agents: a Soccer Case Study

We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and policy but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare two learning algorithms: TD-Q learning with linear neural networks (TD-Q) and Prob...


Differential Eligibility Vectors for Advantage Updating and Gradient Methods

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action in the TD-error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergen...



Journal title:

Volume   Issue

Pages  -

Publication date: 2012